class: title-slide title-rmarkdown center middle

# Advanced R for Econometricians

## Reproducible Research with R Markdown

---
class: left, top

## R Markdown

R Markdown provides a framework for combining code, results, and commentary. You can use R Markdown

- to generate reports, articles, books ...
- as an environment in which to do data science, as a lab notebook where you can capture not only what you did, but also what you were thinking.

Some interesting links:

- [R Markdown: The Definitive Guide](https://bookdown.org/yihui/rmarkdown/xaringan-format.html)
- [Introduction to R Markdown](https://rmarkdown.rstudio.com/lesson-1.html)
- [Cheatsheet](https://www.rstudio.com/wp-content/uploads/2016/03/rmarkdown-cheatsheet-2.0.pdf)

Example:

- [Introduction to Econometrics with R](https://www.econometrics-with-r.org)

---
class: left, top

## What’s possible with R Markdown?

.center[
<iframe src="https://player.vimeo.com/video/178485416?color=428bca&title=0&byline=0&portrait=0" width="720" height="480" frameborder="0" allow="autoplay; fullscreen" allowfullscreen></iframe>
<p><a href="https://vimeo.com/178485416">What is R Markdown?</a> from <a href="https://vimeo.com/rstudioinc">RStudio, Inc.</a> on <a href="https://vimeo.com">Vimeo</a>.</p>
]

---
class: left, top

## Basic example

.code70[
````
---
title: "Diamond sizes"
date: 2016-08-25
output: html_document
---

```{r, include=FALSE}
library(tidyverse)
data("diamonds")
smaller <- diamonds %>%
  filter(carat < 2.5)
```

We have data about `r nrow(diamonds)` diamonds. Only
`r nrow(diamonds) - nrow(smaller)` are **larger** than
2.5 *carats*. The distribution of the remainder is shown
below:

```{r, echo = FALSE}
smaller %>%
  ggplot(aes(carat)) +
  geom_freqpoly(binwidth = 0.01)
```

If $\frac{\alpha}{\beta} = \gamma$ then $\alpha = \gamma\beta$.
````
]

- If this is stored as `example.rmd` in your working directory, run `rmarkdown::render("example.rmd")` to create an output document.
Alternatively, hit the <img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAADsAAAAaCAYAAAAJ1SQgAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAADsMAAA7DAcdvqGQAAASmSURBVFhH7ZdpT1RXGID7B/qhTWs/NLG0aaNiKVrKMuudFUaYpbMUENlkG0YYEFtB1DqoqAUTXAYrtFhwC1UU20igSRsspKlNUyXhg0nTT3yof4Af8PTMnaIzzMVOVcSoN3mSuee899zz3Pds85LBYOB54YXss8oL2cdBe3s7J06cUKxbKZZN9tq1a9y5c4fGxkYkSVKMedIsi+zmzZsZHx/n7t93uXXrFhkZGYpxT5rHLltWVsbo6CjT09N4PB5SU1PRaDSKsU+ah5I1Go2K5T6fTxa9ffs2JpMJvV6vGLdSJC1rtlip2xFiS+txStpP4d7eTWnLQZzeYrk+MnRHRkaYmprC6/U+daIRkpJ1ekuo2jOCueQCueXn8TVfxtX8Lc6m8xQEz7A10Mbw8DCzs7O43W50Op1iOyvNf8raXZVYis/Q2D1N57lZPu39nW3dPxPoHMdWPYTRfZKLl8a4efMmTqczedGJOZifIRxTFimCOSZiypSZEFEickKpbmkeKGuxWqkNXaGgagSD7xxe/ygtXb8QPDpFSdt3ouxLBs5+L6+4pQ1tWD1bFNtRZJFseGY+QT5pwjPMJ/GRHihb3XCImo6f6Ln8F0fO/0nnwAxZtj7SpGOsUR1hV6iPyclJzPmVpGR9hj7QLRYvk2JbCcTKyilNJqNL8KiyRrGaVnVcwSbmqdY+gLX4Air719Ttu0Gw+1dc9cNUNR1Da2smu7iHD4u7WF94AHOBL6EtRRZkFTsaJpLouYnocI1e88yEF9eL39Gxf/+am4hpJx5FWclgJDfPTmaukHENEjh4g1DvDO7aK2y0hnHWXaLuix+p6BzDFhxinfkAb6a3sSq1BZXBk9CeIrLsvFCIlVggKhObbdnp3jCPkY3cP0xmc3Q+Nqj9pKmCpGt2sF5zAI3rFL7AVfz7J9nb+weNR3+jZNcYxvIhVL4wqdYjvKvv4C31Hl7f0IzK4o17wZJEex/xFQ7hRfWLZCLECT2CrCSZ+EC1jfU5rUK4EI2Uj1ayYw/0k1vWR13HD9hrh8lxnEbt6ce6dQhzzRC2hkFM2/rJrujhndzdrMppQmN1K74ogYVh/O/KGi+8jLJ6ycq6nDbS1fVxlYFdxylqvUp56Drl+65TGRqjuG1U7K0X0VT0s9HTQ4p6L6+tbeLllFpeTa3BbLLFtbEk92TFb7mzsXLLKBtBo3eIIVzPBpWfbG2h+AAmsfXk4vR/g750AM0nfajFVpPl6SXDcYy0vMOsMe/nPcs+Vlt284Z5J/rCIFZx0op9wZLEyi7ciy5H5+8yy0aQpEiWLWxUV5OlLZLLiipbMNUOkt94FnvTILqa01gavsIcPE1meQ9v54V4Ja2BlMwKcR42L3luTmCxrEDea+UM/09Zgfytog/ff2YRsqxW2iQya0etd/GRppR0MX8lIR2ps5gtFFXvpCDQh6q8l6wisc24u0hzHGLNpg5Wm3ay1lSNZEly+K4gsuza7M8Fe3k/ZzuZ2i1i60k8GJiF9NbWLj7ePYCu/iQ6/3GM/sMY8l0JsU8rsmxkNV5coUTkb1teXh4Oh0PGKo6TSnFPK3Fz9lnnOZI18A/sgKRuNgdCSQAAAABJRU5ErkJggg==" width="79" /> button in Rstudio. --- class: left, top ## How it works <br> <br> An R Markdown document is a basic text file, with the (conventional) extension `.Rmd`. 
The example already contains the most important components:

- An (optional) YAML header surrounded by ---
- Chunks of R code surrounded by ` ```{r, ...} ` and ` ``` `
- Inline R code surrounded by ` `r ` and ` ` `
- Text mixed with simple text formatting like **bold** and _italics_
- Mathematical expressions

---
class: left, top

## How it works

<br>
<br>

When we click **knit** in RStudio then

1. the `knitr` package executes the code chunks and creates a markdown document (`.md`) which contains the text we have written and optionally the code and its output
2. the `.md` file gets processed by `pandoc` which creates the final file in the desired output format (e.g. `html`, `pdf`, `word`, ...)

.center[

]

---
class: left, top

## The YAML header

The YAML header contains settings passed to pandoc and the rendering functions. Here you can:

- specify the output format with optional output specific options
- provide information for the title page such as title, name, and date
- include external files (e.g. `.css`) or additional packages for `\(\LaTeX\)`
- define parameters.

Basic YAML header using mostly defaults

```yaml
---
title: Habits
author: John Doe
date: March 22, 2005
output: html_document
---
```

---
class: left, top

## The YAML header

Example of a bit more complicated YAML header.
```nohighlights
---
title: Habits
author: John Doe
date: March 22, 2005
output:
  html_document:
    toc: true
    toc_float:
      collapsed: false
      smooth_scroll: false
    css: my_style.css
params:
  pi: 3.141593
---
```

---
class: left, top

## Output Formats

- [`html_notebook`](https://bookdown.org/yihui/rmarkdown/notebook.html) - Interactive R Notebooks
- [`html_document`](https://bookdown.org/yihui/rmarkdown/html-document.html) - HTML document
- [`pdf_document`](https://bookdown.org/yihui/rmarkdown/pdf-document.html) - PDF document
- [`word_document`](https://bookdown.org/yihui/rmarkdown/word-document.html) - Microsoft Word document

Learn more about further output formats at [https://rmarkdown.rstudio.com](https://rmarkdown.rstudio.com/lesson-9.html).

---
class: left, top

## Text Formatting

- R Markdown is built on [Pandoc Markdown](https://rmarkdown.rstudio.com/authoring_pandoc_markdown.html%23raw-tex#pandoc_markdown), which in turn is a flavour of the markup language Markdown.
- The advantage of Markdown compared to other markup languages such as HTML and `\(\LaTeX\)` is its simplicity.

---
class: left, top

## Text Formatting

Some examples:

.pull-left[
- Headers

```markdown
# Header 1
## Header 2
### Header 3
```

.scaled[
# Header 1
## Header 2
### Header 3
]
]

.pull-right[
- Inline formatting

```markdown
*italics*

**bold**

This ~~is deleted text.~~

Some code: `lm(y ~ x)`.
```

*italics*

**bold**

This ~~is deleted text.~~

Some code: `lm(y ~ x)`.
]

---
class: left, top

## Text Formatting

.pull-left[
- Unordered Lists

```markdown
* Item
    - Item belonging to item
    - Another item belonging to item
* Item
* Item
```

* Item
    - Item belonging to item
    - Another item belonging to item
* Item
* Item
]

.pull-right[
- Ordered Lists

```markdown
1. First item
    - First subitem
    - Second subitem
2. Second item
3. Third item
```

1. First item
    - First subitem
    - Second subitem
2. Second item
3. Third item
]

Note that Markdown does not support nested ordered lists with numbering 1.1, 1.2.1 and so on.
For this, you would have to fall back to HTML or `\(\LaTeX\)`.

---
class: left, top

## Text Formatting

<br>

.pull-left[
- Linebreaks

```markdown
First line  
Second line
```

First line  
Second line

<br>
<br>

A new line is started by two whitespaces at the end of the previous line (which you cannot see here).
]

.pull-right[
- Paragraphs

```markdown
First paragraph

Second paragraph
```

First paragraph

Second paragraph

<br>

A new paragraph is started by a blank line.
]

---
class: left, top

## Math Expressions

When rendering to a pdf `\(\LaTeX{}\)` can be used throughout the document. Through `MathJax` the `\(\LaTeX{}\)` math mode can also be used for HTML output as follows:

- Inline `\(\LaTeX{}\)` equations can be written within dollar signs.

```
This is an inline equation $f(x)=\frac{1}{\sqrt{2\pi}}\exp{\left(-\frac{1}{2}x^2\right)}$.
```

This is an inline equation `\(f(x)=\frac{1}{\sqrt{2\pi}}\exp{\left(-\frac{1}{2}x^2\right)}\)`.

- Display style `\(\LaTeX{}\)` equations can be written within double dollar signs.

```
This is display style $$f(x)=\frac{1}{\sqrt{2\pi}}\exp{\left(-\frac{1}{2}x^2\right)}$$.
```

This is display style `$$f(x)=\frac{1}{\sqrt{2\pi}}\exp{\left(-\frac{1}{2}x^2\right)}$$`

---
class: left, top

## Math Expressions

- Even more complex math environments such as `align` can be used

```
\begin{align}
(a+b)^{3} &=(a+b)(a+b)^{2} \\
&=(a+b)\left(a^{2}+2 a b+b^{2}\right) \\
&=a^{3}+3 a^{2} b+3 a b^{2}+b^{3}
\end{align}
```

<br>

\\begin{align}
(a+b)^{3} &=(a+b)(a+b)^{2} \\\\
&=(a+b)\\left(a^{2}+2 a b+b^{2}\\right) \\\\
&=a^{3}+3 a^{2} b+3 a b^{2}+b^{3}
\\end{align}

---
class: left, top

## Code Chunks

.font90[
- You can write arbitrary R code in a code chunk (e.g. run a regression and produce a plot of the result).
- The code, the result of the code or both can be included into the final document.
- In the top curly braces, chunk options can be set to control how the output is handled.
  Some useful options:
    - `eval` (`TRUE`; logical): whether to evaluate the code chunk
    - `echo` (`TRUE`; logical or numeric): whether to include R source code in the output file
    - `cache` (`FALSE`; logical): whether to cache a code chunk
    - `warning` (`TRUE`; logical): whether to preserve warnings
    - `message` (`TRUE`; logical): whether to preserve messages
    - `include` (`TRUE`; logical): whether to include the chunk output in the final output document

  For a complete list, we refer to the `knitr` documentation: [Chunk options](https://yihui.name/knitr/options#code-evaluation).
- Press `Ctrl + Alt + I` to insert a new R code chunk.
]

---
class: left, top

## Code Chunks

````markdown
```{r lin_reg, echo=TRUE, cache = TRUE}
x <- runif(100)
y <- 0.4 * x + rnorm(100)
lm(y ~ x)
```
````

If you mainly use chunk options which are not the default, then it saves time to change the default for the document with `knitr::opts_chunk$set()`.

````markdown
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo=FALSE, warning=FALSE, message=FALSE)
```
````

---
class: left, top

## Inline R Code

You can also use R in your running text. When something changes (e.g. the input data) the numbers in the text adjust accordingly as well.

````markdown
```{r, echo=FALSE}
x <- 4
y <- 38
answer <- x + y
```

The answer is `r answer`
````

The answer is 42.

---
class: left, top

## Tables

<br>
<br>

Tables can be created as follows:

<br>
<br>

.pull-left[
```markdown
| Right | Left | Default | Center |
|------:|:-----|---------|:------:|
|    12 | 12   | 12      |   12   |
|   123 | 123  | 123     |  123   |
|     1 | 1    | 1       |   1    |
```
]

.pull-right[
| Right | Left | Default | Center |
|------:|:-----|---------|:------:|
|    12 | 12   | 12      |   12   |
|   123 | 123  | 123     |  123   |
|     1 | 1    | 1       |   1    |
]

<br>
<br>

In general, however, tables should be produced by your code. The following slides show some examples.

---
class: left, top

## `kable`

- `knitr::kable()` is probably the easiest way to transform a matrix or a data frame into a table.
````markdown
```{r}
result <- data.frame(rbind(rep(12, 4), rep(123, 4), rep(1, 4)))
names(result) <- c("Right", "Left", "Default", "Center")
knitr::kable(result, align = c("r", "l", "l", "c"), format = "html")
```
````

<table>
 <thead>
  <tr>
   <th style="text-align:right;"> Right </th>
   <th style="text-align:left;"> Left </th>
   <th style="text-align:left;"> Default </th>
   <th style="text-align:center;"> Center </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:right;"> 12 </td>
   <td style="text-align:left;"> 12 </td>
   <td style="text-align:left;"> 12 </td>
   <td style="text-align:center;"> 12 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 123 </td>
   <td style="text-align:left;"> 123 </td>
   <td style="text-align:left;"> 123 </td>
   <td style="text-align:center;"> 123 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:left;"> 1 </td>
   <td style="text-align:left;"> 1 </td>
   <td style="text-align:center;"> 1 </td>
  </tr>
</tbody>
</table>

---
class: left, top

## `kableExtra`

- The appearance of a table (not only one produced by `knitr::kable()`) depends on the underlying `css` (in case of an HTML document) or the generated `\(\LaTeX{}\)` code.
- `kableExtra` lets you format your `knitr::kable()` table without writing `css` or `\(\LaTeX{}\)` yourself.

````markdown
```{r}
library(magrittr)
knitr::kable(result, align = c("r", "l", "l", "c"), format = "html") %>%
  kableExtra::kable_styling(bootstrap_options = "bordered")
```
````

- Have a look at the [kableExtra vignette](https://cran.r-project.org/web/packages/kableExtra/vignettes/awesome_table_in_html.html) for examples.

---
class: left, top

## `xtable`

- `xtable` provides more functionality than `kable` but is a bit more difficult to use.
- If your code produces raw HTML or `\(\LaTeX{}\)`, e.g. for a table, then you have to set `results='asis'` in the chunk options. Otherwise the output gets formatted as ordinary chunk output and will not work as intended.
````markdown
```{r, echo=FALSE, results='asis'}
library(xtable)
my_xtable <- xtable(result, align = c("r", "r", "l", "l", "c"))
print.xtable(my_xtable, type = "html", include.rownames = FALSE)
```
````

<!-- html table generated in R 3.6.0 by xtable 1.8-4 package -->
<!-- Thu Oct 24 11:00:44 2019 -->
<table border=1>
<tr> <th> Right </th> <th> Left </th> <th> Default </th> <th> Center </th> </tr>
<tr> <td align="right"> 12.00 </td> <td> 12.00 </td> <td> 12.00 </td> <td align="center"> 12.00 </td> </tr>
<tr> <td align="right"> 123.00 </td> <td> 123.00 </td> <td> 123.00 </td> <td align="center"> 123.00 </td> </tr>
<tr> <td align="right"> 1.00 </td> <td> 1.00 </td> <td> 1.00 </td> <td align="center"> 1.00 </td> </tr>
</table>

---
class: left, top

## `stargazer`

- `stargazer` takes model objects as input and automatically creates an output table

.pull-left[
<br>
````markdown
```{r, results='asis', echo=TRUE}
linear_model <- lm(mpg ~ cyl + wt, data = mtcars)
stargazer::stargazer(linear_model,
                     type = "html",
                     out.header = TRUE,
                     omit.table.layout = "sn")
```
````
]

.pull-right[
<br>
.left[
.font80[
<table style="text-align:center"><tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"></td><td><em>Dependent variable:</em></td></tr>
<tr><td></td><td colspan="1" style="border-bottom: 1px solid black"></td></tr>
<tr><td style="text-align:left"></td><td>mpg</td></tr>
<tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">cyl</td><td>-1.508<sup>***</sup></td></tr>
<tr><td style="text-align:left"></td><td>(0.415)</td></tr>
<tr><td style="text-align:left"></td><td></td></tr>
<tr><td style="text-align:left">wt</td><td>-3.191<sup>***</sup></td></tr>
<tr><td style="text-align:left"></td><td>(0.757)</td></tr>
<tr><td style="text-align:left"></td><td></td></tr>
<tr><td style="text-align:left">Constant</td><td>39.686<sup>***</sup></td></tr>
<tr><td style="text-align:left"></td><td>(1.715)</td></tr>
<tr><td style="text-align:left"></td><td></td></tr>
<tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr></table>
]]]

---
class: left, top

## Plots

````markdown
```{r, fig.height = 5}
hist(rnorm(1000))
```
````

<!-- -->

---
class: left, top

## Links and External Graphics

- Links can be created as follows:

```markdown
[This is a link to the R Markdown book](https://bookdown.org/yihui/rmarkdown/)
```

- To include figures saved on your computer use

```markdown

```

- If you need more control over the appearance of the figure you can use

````markdown
```{r, out.width = "500px", fig.align='center'}
knitr::include_graphics("path/to/figure.png")
```
````

---
class: left, top

## Using HTML and `\(\LaTeX{}\)`

- If only R Markdown is used, the document can be compiled without problems to all supported output formats.
- Sometimes R Markdown is too limited and one would like to use the more powerful tools provided by HTML or `\(\LaTeX{}\)`.
- If the output format is HTML, you can also use HTML instead of R Markdown or mix both.
- If the output format is PDF, the same can be said about `\(\LaTeX{}\)`.

---
class: left, top

## Interactive documents

- If the output is HTML, interactive elements (usually some JavaScript) can be included.
- There are R packages producing interactive output, e.g. `plotly` for interactive plots or `leaflet` for interactive maps.

````markdown
```{r}
library(plotly)
mtcars$am[which(mtcars$am == 0)] <- 'Automatic'
mtcars$am[which(mtcars$am == 1)] <- 'Manual'
mtcars$am <- as.factor(mtcars$am)

plot_ly(mtcars, x = ~wt, y = ~hp, z = ~qsec,
        color = ~am, colors = c('#BF382A', '#0C4B8E')) %>%
  add_markers() %>%
  layout(scene = list(xaxis = list(title = 'Weight'),
                      yaxis = list(title = 'Gross horsepower'),
                      zaxis = list(title = '1/4 mile time')))
```
````

---
class: left, top

## Interactive documents
- For more examples see [plot.ly](https://plot.ly/r/). --- class: left, top ## Interactive documents - An interactive map. ````markdown ```{r, out.width='100%'} library(leaflet) leaflet() %>% addTiles() %>% setView(7.005070, 51.463675,zoom = 17) %>% addMarkers( 7.005070, 51.463675 ) ``` ```` - Learn more at [Leaflet for R](https://rstudio.github.io/leaflet). --- class: left, top ## Interactive documents
--- class: left, top ## Interactive documents - The `DT` package also allows us to include interactive tables. ````markdown ```{r} DT::datatable( head(iris, 10), fillContainer = FALSE, options = list(pageLength = 8) ) ``` ````
--- class: left, top ## Advanced Topics: Hooks - You can define your own chunk options using hooks. This is a topic for you if you want to become a professional `knitr` user. #### Chunk Hooks - A chunk hook is a function defining what should happen before or after a code chunk. .code70[ ```r # Template for a chunk hook knitr::knit_hooks$set( foo_hook = function(before, options, envir) { if (before) { ## code to be run before a chunk } else { ## code to be run after a chunk } }) ``` ] - Before the chunk is evaluated the function is called with `before == TRUE` and after with `before == FALSE`. - `options` is a list with the current chunk options (optional). - `envir` is the enviornment in which the the code chunk is evaluated (optional). --- class: left, top ## Chunk Hooks: Example - To reduce the margins of a plot with chunk option `small.mar = TRUE` put the following code chunk at the top of your document. ````markdown ```{r, eval = FALSE, echo=TRUE} knitr::knit_hooks$set(small.mar = function(before) { if (before) par(mar = c(1, 1, 1, 1)) }) ``` ```` - `small.mar` now can be used as other chunk options. ````markdown ```{r, small.mar = TRUE} hist(rnorm) ``` ```` --- class: left, top ## Output Hooks - Output hooks are (render) functions that define how source code and its output is presented. - There are 8 output hooks to deal with differnt types of output: - `source`: the source code - `output`: what would have been printed in an R terminal except warnings, messages and errors. - `warnings`: warnings from `warning()` - `message`: messages from `message()` - `error`: errors from `stop()` - `plot`: graphics output - `inline`: output of inline R code - `chunk`: all the output of a chunk - `document`: the output of the whole document - There is for each of the above a default which you can adjust to your needs. --- class: left, top ## Output Hooks: Example - You can e.g. use a hook to consistantly style all your tables using `kableExtra`. 
```r
library(knitr)
library(kableExtra)
library(magrittr)  # provides %>%

# Keep the default source hook for chunks that do not set the `table` option
default_source_hook <- knit_hooks$get("source")

knit_hooks$set(
  source = function(x, options) {
    if (is.null(options$table)) {
      default_source_hook(x, options)
    } else {
      # Evaluate the chunk source and render its value as a styled HTML table
      eval(parse(text = x)) %>%
        kable("html") %>%
        kable_styling("hover", full_width = FALSE)
    }
  }
)
```

---
class: left, top

## The `here` package

- `here::here()` returns the path to the directory in which your project file lives.

Why is this useful?

- When an R Markdown file is compiled, the working directory is set to the folder where your `.rmd` file is stored.
- Usually you do some interactive work before compiling. If you work in a project, the working directory for the interactive part is set to the path of your project file.
- If your project becomes more complex and you are using subdirectories, the working directories used for compilation and for interactive work are not necessarily the same. This is quite annoying.

#### Solution

- use `here::here()` when setting a path.

```r
load(here::here("Data", "my_dataset.rda"))
```

---
class: left, top

## Exercises

1. Create an R Markdown document that is compiled to an HTML document. The document should consist of two sections:
    - Crime Incidents Report Data
    - Exploratory Data Analysis.
2. Use the first section to read in the "Crime Incident Reports" available under [Boston crime](https://data.boston.gov/dataset/crime-incident-reports-august-2015-to-date-source-new-system). Add some information about the source of the data and list all available columns with a short description.

    Hint: Take a look at the table `RMS_Crime_Incident_Field_Explanation`.
3. Optimize your code chunks so that the document doesn't load the data every time you compile.
4. Use the second section for an EDA. You are free to do whatever you like but produce at least one table and one graphic.
5. Hide all your code chunks by setting a global option.
6. Create a PDF document containing what you did. For this `\(\LaTeX{}\)` needs to be installed.
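
Hint for exercise 6: the slides only say that `\(\LaTeX{}\)` has to be installed. The sketch below is one possible way to check this from R before knitting to PDF. It is not part of the original material: looking for `pdflatex` in particular and suggesting the `tinytex` package as an installer are assumptions, and other engines or distributions work as well.

```r
# Assumption: pdflatex is the engine of interest; other LaTeX engines
# (e.g. xelatex) could be checked in the same way.
latex_path  <- Sys.which("pdflatex")
latex_found <- unname(nzchar(latex_path))

if (latex_found) {
  message("pdflatex found at: ", latex_path)
} else {
  # Assumption: tinytex is one common way to obtain a LaTeX distribution.
  message("No pdflatex found. One option is the tinytex package:\n",
          "  install.packages(\"tinytex\"); tinytex::install_tinytex()")
}
```

The check only inspects the search path, so it reports what R itself can see, which is what matters when the document is compiled from R.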
---
class: left, top

## Exercises

<ol start="7">
<li> Remember that nested ordered lists are not supported by R Markdown. Produce one nonetheless in your HTML and your PDF document. </li>
<li> Also build some slides using the code you have written. You are free to choose any of the available presentation formats. </li>
<li> Find out how to always use the current date in the YAML header.</li>
</ol>
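
---
class: left, top

## Appendix: Trying the `knitr` step in the console

The first step of the pipeline described under *How it works* (the `knitr` package executes chunks and inline code and produces Markdown) can be tried without RStudio. The following is a minimal sketch, not part of the original slides. It assumes only that `knitr` is installed, passes the inline-code example from above as a character vector instead of a file, and builds the chunk fences with a helper variable `fence` (a name chosen here for illustration).

```r
library(knitr)

# Three backticks, built programmatically so this example nests cleanly
fence <- strrep("`", 3)

# The inline-code example from the slides, as lines of an R Markdown document
rmd_lines <- c(
  paste0(fence, "{r, echo=FALSE}"),
  "x <- 4",
  "y <- 38",
  "answer <- x + y",
  fence,
  "",
  "The answer is `r answer`"
)

# With `text =`, knit() returns the resulting Markdown as a character string
# instead of writing an .md file; pandoc (the second step) is not involved.
md <- knit(text = rmd_lines, quiet = TRUE)
cat(md, sep = "\n")
```

The printed Markdown should contain the sentence with the computed value filled in, which is exactly what `pandoc` would receive in the second step.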